Distant Supervision for Tweet Classification Using YouTube Labels
نویسندگان
چکیده
We study an approach to tweet classification based on distant supervision, whereby we automatically transfer labels from one social medium to another. In particular, we apply classes assigned to YouTube videos to tweets linking to these videos. This provides for free a virtually unlimited number of labelled instances that can be used as training data. The experiments we have run show that a tweet classifier trained via these automatically labelled data substantially outperforms an analogous classifier trained with a limited amount of manu-
منابع مشابه
Simple Queries as Distant Labels for Predicting Gender on Twitter
The majority of research on extracting missing user attributes from social media profiles use costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered using external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confir...
متن کاملSentiment Analysis using Deep Convolutional Neural Networks with Distant Supervision
This thesis addresses the problem of predicting message-level sentiments of English micro-blog messages from Twitter. Convolutional neural networks (CNN) have shown great promise in the task of sentiment classification. Here we expand the CNN proposed by [31, 32] and perform an in-depth analysis to deepen the understanding of these systems. In a first step we compare the performance of differen...
متن کاملUsing millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction o...
متن کاملReducing Wrong Labels in Distant Supervision for Relation Extraction
In relation extraction, distant supervision seeks to extract relations between entities from text by using a knowledge base, such as Freebase, as a source of supervision. When a sentence and a knowledge base refer to the same entity pair, this approach heuristically labels the sentence with the corresponding relation in the knowledge base. However, this heuristic can fail with the result that s...
متن کاملAn Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform
Large-scale datasets have played a significant role in progress of neural network and deep learning areas. YouTube-8M is such a benchmark dataset for general multilabel video classification. It was created from over 7 million YouTube videos (450,000 hours of video) and includes video labels from a vocabulary of 4716 classes (3.4 labels/video on average). It also comes with pre-extracted audio &...
متن کامل